-
Notifications
You must be signed in to change notification settings - Fork 298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIRRTL] Subword assignment support via rewriting to read-modify-write #3658
base: main
Are you sure you want to change the base?
Conversation
would be good to have some parser (parse-basic.fir) and flow checking tests (connect-errors.mlir). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have a simple e2e test in test/firool/firtool.fir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple more comments
I removed the recursion from |
; VERILOG-LABEL: module SubwordAssign( | ||
; VERILOG-NEXT: input c, | ||
; VERILOG-NEXT: output [3:0] out); | ||
; VERILOG: assign out = {2'h3, ~c, 1'h1}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I am having a hard time parsing the MLIR-internal firrtl syntax, but as far as I can tell the following kinds of tests might be missing:
Ideally you would add the product of {Bundle field, Vec field, UInt, SInt} x {Port, Wire, Reg, Memory Port} as tests. Maybe there are some combinations that would make sense to skip. |
There is a test for subword assignment to a register. Bundles and Vecs have been lowered away by the time this pass runs, so I'm not sure if we need tests for subword assignments to those, but if so they would have to go in the end-to-end firtool test. I have a memory port test using FIRRTL memories that I could add (it would be a little large). In investigating the memory ports though I realized that there is an issue with subword assignment to CHIRRTL memory ports. The LowerCHIRRTL pass runs before this pass and is not able to handle a bits op on the left-hand-side. I'm not sure how to address this, since I am not familiar with CHIRRTL memories (is there documentation on how they work anywhere?). Currently the LowerCHIRRTL pass will give a flow error if a bits op is used for subword assignment. |
I am curious, with this PR, are you able to simplify:
to
?? |
CIRCT just treats invalid values as 0, so this gets generated for your example:
It would be cool if there were smarter invalid value optimization that could simplify this though! |
In general firrtl does some of these optimizations. (Even though, they can be a little bit problematic since they can break local invariants in the presence of invalid). So for example:
is compiled to
|
I decided to follow your lead and just plug in zeros for now. |
@zyedidia: What does your current implementation do with this circuit? Did you manage to make the combinatorial loop checker accept this?
Or this circuit which might be closer to something users might actually want to write:
|
This PR doesn't modify the combinational cycle checker, so those examples cause errors. If you disable the check with
and this for the second circuit:
I think the second one has a true combinational cycle (on |
Note: this isn't correct. CIRCT implements the same context-sensitive interpretation of invalid that the SFC uses. I had to do some excavation to discern what this is. It's documented here: https://circt.llvm.org/docs/Dialects/FIRRTL/RationaleFIRRTL/#interpretation-of-undefined-behavior. In test form: https://github.com/llvm/circt/blob/main/test/Dialect/FIRRTL/SFCTests/invalid-interpretations.fir The only known divergence from the SFC is around SFC optimizations that explicitly check for self-connects to registers. E.g., your example works as expected:
|
It shouldn't on the bit-level since
|
Ah yes that's right. |
For now I think it is fine that this is rejected by the comb loop checker. However, I have a feeling that once people start using sub-word assignments, they will start complaining about us rejecting this kind of code. |
Here is a simple example with a CHIRRTL memory:
Currently MFC throws a flow error on this: |
@zyedidia The flow checking is implemented as a verifier that is run after each pass. In this case, it is failing after the %8 = "firrtl.mem"() {annotations = [], depth = 1024 : i64, name = "mem", nameKind = #firrtl<name_kind interesting_name>, portAnnotations = [[]], portNames = ["MPORT"], readLatency = 1 : i32, ruw = 0 : i32, writeLatency = 1 : i32} : () -> !firrtl.bundle<addr: uint<10>, en: uint<1>, clk: clock, rdata flip: uint<8>, wmode: uint<1>, wdata: uint<8>, wmask: uint<1>>
%12 = "firrtl.subfield"(%8) {fieldIndex = 3 : i32} : (!firrtl.bundle<addr: uint<10>, en: uint<1>, clk: clock, rdata flip: uint<8>, wmode: uint<1>, wdata: uint<8>, wmask: uint<1>>) -> !firrtl.uint<8>
%16 = "firrtl.bits"(%12) {hi = 1 : i32, lo = 0 : i32} : (!firrtl.uint<8>) -> !firrtl.uint<2>
"firrtl.strictconnect"(%16, %19) : (!firrtl.uint<2>, !firrtl.uint<2>) -> () Here is a more verbose outputfirrtl.circuit "Ram" {
firrtl.module @Ram(in %clock: !firrtl.clock, in %reset: !firrtl.uint<1>, in %io: !firrtl.bundle<addr: uint<32>, wdata: uint<8>>) {
%0 = firrtl.subfield %io(1) : (!firrtl.bundle<addr: uint<32>, wdata: uint<8>>) -> !firrtl.uint<8>
%1 = firrtl.subfield %io(0) : (!firrtl.bundle<addr: uint<32>, wdata: uint<8>>) -> !firrtl.uint<32>
%mem = chirrtl.seqmem Undefined : !chirrtl.cmemory<uint<8>, 1024>
%MPORT_data, %MPORT_port = chirrtl.memoryport Write %mem {name = "MPORT"} : (!chirrtl.cmemory<uint<8>, 1024>) -> (!firrtl.uint<8>, !chirrtl.cmemoryport)
%2 = firrtl.bits %MPORT_data 1 to 0 : (!firrtl.uint<8>) -> !firrtl.uint<2>
%3 = firrtl.bits %1 9 to 0 : (!firrtl.uint<32>) -> !firrtl.uint<10>
%_T = firrtl.node %3 : !firrtl.uint<10>
chirrtl.memoryport.access %MPORT_port[%_T], %clock : !chirrtl.cmemoryport, !firrtl.uint<10>, !firrtl.clock
%4 = firrtl.tail %0, 6 : (!firrtl.uint<8>) -> !firrtl.uint<2>
firrtl.strictconnect %2, %4 : !firrtl.uint<2>
}
}
<stdin>:13:7: error: 'firrtl.strictconnect' op connect has invalid flow: the destination expression has source flow, expected sink or duplex flow
firrtl.strictconnect %2, %4 : !firrtl.uint<2>
^
<stdin>:13:7: note: see current operation: "firrtl.strictconnect"(%16, %19) : (!firrtl.uint<2>, !firrtl.uint<2>) -> ()
<stdin>:8:12: note: the destination was defined here
%2 = firrtl.bits %MPORT_data 1 to 0 : (!firrtl.uint<8>) -> !firrtl.uint<2>
^
// -----// IR Dump After LowerCHIRRTLPass Failed (firrtl-lower-chirrtl) //----- //
"firrtl.module"() ({
^bb0(%arg0: !firrtl.clock, %arg1: !firrtl.uint<1>, %arg2: !firrtl.bundle<addr: uint<32>, wdata: uint<8>>):
%0 = "firrtl.invalidvalue"() : () -> !firrtl.uint<1>
%1 = "firrtl.invalidvalue"() : () -> !firrtl.uint<8>
%2 = "firrtl.constant"() {value = 1 : ui1} : () -> !firrtl.uint<1>
%3 = "firrtl.invalidvalue"() : () -> !firrtl.clock
%4 = "firrtl.constant"() {value = 0 : ui1} : () -> !firrtl.uint<1>
%5 = "firrtl.invalidvalue"() : () -> !firrtl.uint<10>
%6 = "firrtl.subfield"(%arg2) {fieldIndex = 1 : i32} : (!firrtl.bundle<addr: uint<32>, wdata: uint<8>>) -> !firrtl.uint<8>
%7 = "firrtl.subfield"(%arg2) {fieldIndex = 0 : i32} : (!firrtl.bundle<addr: uint<32>, wdata: uint<8>>) -> !firrtl.uint<32>
%8 = "firrtl.mem"() {annotations = [], depth = 1024 : i64, name = "mem", nameKind = #firrtl<name_kind interesting_name>, portAnnotations = [[]], portNames = ["MPORT"], readLatency = 1 : i32, ruw = 0 : i32, writeLatency = 1 : i32} : () -> !firrtl.bundle<addr: uint<10>, en: uint<1>, clk: clock, rdata flip: uint<8>, wmode: uint<1>, wdata: uint<8>, wmask: uint<1>>
%9 = "firrtl.subfield"(%8) {fieldIndex = 0 : i32} : (!firrtl.bundle<addr: uint<10>, en: uint<1>, clk: clock, rdata flip: uint<8>, wmode: uint<1>, wdata: uint<8>, wmask: uint<1>>) -> !firrtl.uint<10>
"firrtl.strictconnect"(%9, %5) : (!firrtl.uint<10>, !firrtl.uint<10>) -> ()
%10 = "firrtl.subfield"(%8) {fieldIndex = 1 : i32} : (!firrtl.bundle<addr: uint<10>, en: uint<1>, clk: clock, rdata flip: uint<8>, wmode: uint<1>, wdata: uint<8>, wmask: uint<1>>) -> !firrtl.uint<1>
"firrtl.strictconnect"(%10, %4) : (!firrtl.uint<1>, !firrtl.uint<1>) -> ()
%11 = "firrtl.subfield"(%8) {fieldIndex = 2 : i32} : (!firrtl.bundle<addr: uint<10>, en: uint<1>, clk: clock, rdata flip: uint<8>, wmode: uint<1>, wdata: uint<8>, wmask: uint<1>>) -> !firrtl.clock
"firrtl.strictconnect"(%11, %3) : (!firrtl.clock, !firrtl.clock) -> ()
%12 = "firrtl.subfield"(%8) {fieldIndex = 3 : i32} : (!firrtl.bundle<addr: uint<10>, en: uint<1>, clk: clock, rdata flip: uint<8>, wmode: uint<1>, wdata: uint<8>, wmask: uint<1>>) -> !firrtl.uint<8>
%13 = "firrtl.subfield"(%8) {fieldIndex = 4 : i32} : (!firrtl.bundle<addr: uint<10>, en: uint<1>, clk: clock, rdata flip: uint<8>, wmode: uint<1>, wdata: uint<8>, wmask: uint<1>>) -> !firrtl.uint<1>
"firrtl.strictconnect"(%13, %4) : (!firrtl.uint<1>, !firrtl.uint<1>) -> ()
%14 = "firrtl.subfield"(%8) {fieldIndex = 5 : i32} : (!firrtl.bundle<addr: uint<10>, en: uint<1>, clk: clock, rdata flip: uint<8>, wmode: uint<1>, wdata: uint<8>, wmask: uint<1>>) -> !firrtl.uint<8>
"firrtl.strictconnect"(%14, %1) : (!firrtl.uint<8>, !firrtl.uint<8>) -> ()
%15 = "firrtl.subfield"(%8) {fieldIndex = 6 : i32} : (!firrtl.bundle<addr: uint<10>, en: uint<1>, clk: clock, rdata flip: uint<8>, wmode: uint<1>, wdata: uint<8>, wmask: uint<1>>) -> !firrtl.uint<1>
"firrtl.strictconnect"(%15, %0) : (!firrtl.uint<1>, !firrtl.uint<1>) -> ()
%16 = "firrtl.bits"(%12) {hi = 1 : i32, lo = 0 : i32} : (!firrtl.uint<8>) -> !firrtl.uint<2>
%17 = "firrtl.bits"(%7) {hi = 9 : i32, lo = 0 : i32} : (!firrtl.uint<32>) -> !firrtl.uint<10>
%18 = "firrtl.node"(%17) {annotations = [], name = "_T", nameKind = #firrtl<name_kind interesting_name>} : (!firrtl.uint<10>) -> !firrtl.uint<10>
"firrtl.strictconnect"(%9, %18) : (!firrtl.uint<10>, !firrtl.uint<10>) -> ()
"firrtl.strictconnect"(%10, %2) : (!firrtl.uint<1>, !firrtl.uint<1>) -> ()
"firrtl.strictconnect"(%11, %arg0) : (!firrtl.clock, !firrtl.clock) -> ()
"firrtl.strictconnect"(%15, %4) : (!firrtl.uint<1>, !firrtl.uint<1>) -> ()
%19 = "firrtl.tail"(%6) {amount = 6 : i32} : (!firrtl.uint<8>) -> !firrtl.uint<2>
"firrtl.strictconnect"(%16, %19) : (!firrtl.uint<2>, !firrtl.uint<2>) -> ()
}) {annotations = [], parameters = [], portAnnotations = [], portDirections = 0 : i3, portNames = ["clock", "reset", "io"], portSyms = [], portTypes = [!firrtl.clock, !firrtl.uint<1>, !firrtl.bundle<addr: uint<32>, wdata: uint<8>>], sym_name = "Ram"} : () -> () This might help explain what should be happening: circt/lib/Dialect/FIRRTL/Transforms/LowerCHIRRTL.cpp Lines 541 to 547 in daddbc6
To get this working, I think that you need to do two things:
|
During expand whens, if we form a mux with invalid and another value, we choose the other value. E.g. Edit: There is a bunch of discussion about this above that I missed while I was on vacation. If we're just plugging in 0 for invalid, then maybe the first example is incorrect? First example seems fine:
module Bits(
input p,
output [7:0] o);
assign o = 8'h1;
endmodule Second one looks like the invalid is lowered to 0:
module Bits(
input p,
output [7:0] o);
assign o = {7'h0, p};
endmodule |
I think this behavior is caused by the invalid value optimizer seeing a bits of an invalid value, rather than the invalid value directly, so it doesn't apply the optimization. We could add a canonicalization within the pass transforming |
I'm not sure if this behavior is desirable, but I have made an implementation using the canonicalization approach here: zyedidia@0cd42fe. It also required implementing a special version of |
The lack of when optimization here is probably safe given that this is totally new code. However, it would be ideal, as @youngar states, to be consistent with the existing invalid interpretations. Namely, an invalid in a when path means "choose the other path unconditionally" while it means zero in most other contexts. Not being consistent here means that designs that use a vector of bools will have different behavior from a uint with bit selects. These should be equivalent. I also believe that the SFC interpretation @ekiwi wrote is now also wrong in the same way. |
Sounds good, I have added the optimization and a test for it. |
Status? |
I think if there is interest in having this feature, chipsalliance/firrtl-spec#26 needs to be merged first. I am not in a position to decide what (if anything) needs to change there though, or to perform the actual merge (I can rebase the update though if desired). If that PR is merged (or confirmed that it will be merged), then I think this can be rebased (it seems like ExpandWhens has been refactored a bit since this was opened, but hopefully nothing too drastic). I'm not very familiar with the new reference types (seems to be a source of some merge conflicts), so I'm not sure if the interaction with subword assignment would cause problems in the current proposal/implementation. If there isn't consensus in favor of this feature/approach, or if significant changes are needed (warranting a redesign of the proposal/implementation), then we can close this PR and the spec PR and just refer back to them if a future version of subword assignment is proposed (or possibly re-use some code if applicable). |
This implements subword assignment by allowing the
BitsPrimOp
as a sink and then rewriting connections to it during the expand-whens pass.When a value is subword-assigned without having a previous connection, an "uninitialized" wire is created to represent the previous value. During initialization checking, the checker makes sure that no uninitialized wires are reachable from any connection in the module. Some additional canonicalizations of the bits primop are needed to make sure that uses of uninitialized wires are eliminated if all bits are assigned. The canonicalizations are implemented in
ExpandWhens
because they are necessary for the pass. I also added them toFIRRTLFolds.cpp
because they might be generally useful, but I'm not sure how to use the ones fromFIRRTLFolds
during expand-whens because I couldn't figure out how to run canonicalizations during a pass rather than as a separate pass. There's probably a better solution here, so let me know if there's something I should change.The parser has also been modified to support new syntax for
BitsPrimOp
:x[hi:lo]
is equivalent tobits(x, hi, lo)
, andx[i]
is equivalent tobits(x, i, i)
. I'm not sure if I made this modification in the best way, or if we should use this new syntax at all, rather than just supportingbits(x, hi, lo) <=
(currently this syntax is also supported). It uses the subaccess cache for the single bit index, but not for the range index.This implementation does not accept any combinational loops.
Let me know what you think, thanks!